Calculating classifier calibration performance with a custom modification of Weka
Authors
Abstract
Calibration is often overlooked in machine-learning approaches to problem solving, even where an accurate estimate of the predicted probabilities, and not only discrimination between classes, is critical for decision-making. One reason is the lack of readily available open-source software packages that can easily calculate calibration metrics. To provide one such tool, we have developed a custom modification of the Weka data mining software that calculates Hosmer-Lemeshow risk groups and the Pearson chi-square statistic comparing estimated and observed frequencies for binary problems. We provide calibration performance estimates for logistic regression (LR), BayesNet, Naïve Bayes, artificial neural network (ANN), support vector machine (SVM), k-nearest neighbors (KNN), decision tree, and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) models on six different datasets. Our experiments show that SVMs with RBF kernels exhibit the best calibration results, while decision trees, RIPPER, and KNN are highly unlikely to produce well-calibrated models.
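As a rough illustration of the kind of calculation the abstract describes, the following Python sketch groups predictions into risk groups and computes a Pearson chi-square statistic between expected and observed frequencies. It is not the authors' Weka modification: the function name `hosmer_lemeshow`, the equal-size grouping by sorted probability, the default of ten groups, and the conventional `groups - 2` degrees of freedom are all assumptions made for this example.

```python
def hosmer_lemeshow(probs, labels, groups=10):
    """Return (statistic, degrees_of_freedom) for a binary problem.

    probs  -- predicted probabilities of the positive class
    labels -- observed outcomes, 1 for positive and 0 for negative
    groups -- number of risk groups (hypothetical default of 10)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    # Order instances by predicted risk; the sort is stable for ties.
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0])
    n = len(pairs)

    statistic = 0.0
    used_groups = 0
    for g in range(groups):
        # Split the ordered instances into roughly equal-size groups.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected_pos = sum(p for p, _ in chunk)
        observed_pos = sum(y for _, y in chunk)
        expected_neg = size - expected_pos
        observed_neg = size - observed_pos

        # Pearson chi-square terms; cells with zero expectation are skipped.
        if expected_pos > 0:
            statistic += (observed_pos - expected_pos) ** 2 / expected_pos
        if expected_neg > 0:
            statistic += (observed_neg - expected_neg) ** 2 / expected_neg
        used_groups += 1

    return statistic, max(used_groups - 2, 0)
```

A smaller statistic indicates closer agreement between estimated and observed frequencies. Turning it into a p-value would require a chi-square distribution function, which this sketch leaves out.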
Similar resources
Comparison of Classifier Algorithms in the Identification of Polypharmacy and Factors Affecting it in the Elderly Patients
Introduction: Prescribing and consuming more drugs than necessary, known as polypharmacy, both wastes resources and harms patients. Polypharmacy is especially important for elderly patients; therefore, the factors affecting it must be identified and analyzed properly. Method: In this retrospective study, first, several classifier algorithms, i.e., C4.5, SVM, KNN, MLP, and BN for ...
Comparative Assessment of the Performance of Three WEKA Text Classifiers Applied to Arabic Text
This research compares the performance of three well-known text classification techniques, namely the Support Vector Machine (SVM) classifier, the Naïve Bayes (NB) classifier, and the C4.5 classifier. Text classification aims to automatically assign a text to a predefined category based on its linguistic features and content. These three techniques are compared using a set of Arabic text ...
Discretizing Continuous Features for Naive Bayes and C4.5 Classifiers
In this work, popular discretization techniques for continuous features in datasets are surveyed, and a new one based on equal-width binning and error minimization is introduced. This discretization technique is applied to the Adult database from the UCI Machine Learning Repository [7] and tested on two classifiers from the WEKA tool [6], NaiveBayes and J48. Relative performance changes for t...
A Data Mining Approach for Precise Diagnosis of Dengue Fever
Dengue is an eviscerating disease common in tropical countries. It is also known as break-bone fever. The dengue dataset gives information about patients suffering from the disease. The dataset consists of attributes such as fever, bleeding, metallic taste, and fatigue. The main objective of this study is to calculate the performance of various classification techniques and compare their performa...
Publication date: 2014